Tag
6 articles
NVIDIA's Gated DeltaNet-2 decouples the erase and write operations in linear attention, outperforming models such as Mamba-2 and Kimi Delta Attention (KDA) on long-context tasks.
Learn how Lighthouse Attention speeds up AI training on long inputs by selectively focusing on important information, without sacrificing accuracy.
Learn how to set up and use FlashKDA, an open-source, high-performance implementation of Moonshot AI's Kimi Delta Attention, to accelerate attention computation in large language models.
This article explains how Xiaomi's MiMo-V2.5 models achieve frontier-level AI performance with significantly lower token costs, focusing on agentic AI, token efficiency, and advanced optimization techniques.
Learn how TriAttention, a new AI method, compresses memory in large language models to make them 2.5x faster without losing accuracy.
Learn how to build a hybrid neural network architecture that combines attention with convolutional layers, similar to Liquid AI's LFM2-24B-A2B model, to address scaling bottlenecks in large language models.